# Here is a problem taken from the "real world".

# simplified approach...

# In order to decide which of two colors to use on a
# company web site the design team calls together 100
# people.  They divide the people into two 50 person groups. 
# The first group is shown color one and the second group is 
# shown color two.  Each group is asked to rate the 
# color they have been shown on a scale of 0 to 6 where
# 0 means they dislike the color and 6 means that they
# really like the color.  The first group has a mean score
# of 4.8 while the second group has a mean score of 4.6.
# Some in the design team say "The first color is significantly
# better than the second color."  Others in the design 
# team say "The scores are too close to make that conclusion."

# What can we say about this, from a statistical perspective?

# First, the sampling is questionable.  We do not know if
# the 100 people are in any way representative of people
# who will use the web site.  
#
# After that, we could form a
# null hypothesis that the mean scores for the web site 
# users, if we could find and query all of them, would 
# be the same value for the two colors.  
# Then our alternative is that the
# mean score for all web site users would be higher for
# the first color than it would be for the second color.
#
# Then we would turn to our t-test for the equality of 
# two means, hypoth_2test_unknown(). However,
# to use that function we need to know the desired level
# of significance and we need to know the standard 
# deviation of each of the two 50-person samples.  
#
# Of these
# the former is easy.  We will run the test at the 0.05 
# level of significance.  
#
# What about the latter issue: 
# knowing the standard deviations? Let us see how those
# sample standard deviations might affect the analysis.
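#
# A minimal sketch of that dependence, using only base R.
# welch_sketch() is a hypothetical helper written for this
# illustration; it is not hypoth_2test_unknown() and it may
# report things differently.  It assumes the usual Welch
# (unequal variance) form of the test.  The standard
# deviations tried below are made-up values, not taken
# from any data.
welch_sketch <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1               # squared standard error, sample 1
  v2 <- s2^2 / n2               # squared standard error, sample 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  # one-sided P-value for the alternative "first mean is greater"
  c(t = t_stat, df = df, p = pt(t_stat, df, lower.tail = FALSE))
}
trial_sds <- c(0.4, 0.8, 1.2)
sketch_results <- sapply(trial_sds, function(s)
  welch_sketch(4.8, s, 50, 4.6, s, 50))
colnames(sketch_results) <- paste0("sd=", trial_sds)
round(sketch_results, 4)
# Same means and same sample sizes in every column: the
# P-value grows as the standard deviation grows.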




# First generate two samples, one with mean 4.8 and
# the other with mean 4.6.  To get a large standard
# deviation in each, use mostly 5's and then include
# a few 1's and a 3.

first <- rep(5,50)
first[1]<-1
first[2]<-1
first[3]<-3
first  #look at the first
mean( first)
sd(first)
#
# Then, create a second sample, starting from the first
# and then changing just 3 values so that we alter the mean
# value of that second sample
second <- first
second[4]<-1
second[5]<-1
second[6]<-3
second  # look at the second

mean(second)
sd(second)

#  Now do a two population test with the null
#  hypothesis that the two means are the same versus
#  the alternative that the first mean is greater 
#  than the second.
#
source("../hypo_2unknown.R")
hypoth_2test_unknown(sd(first), 50, mean( first),
                     sd(second), 50, mean(second),
                     1, 0.05)
#  The full Attd value of 0.16 means that, if the null
#  hypothesis is true, we would see a difference in the
#  sample means this large, or larger, in about 1/6 of
#  repeated samples of this size.
#  Therefore, we do not have enough evidence
#  in these two samples to reject the null hypothesis.

#  Note that full Attd is meant as the attained 
#  or achieved significance using the full degrees of 
#  freedom.
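#
#  One possible cross-check, offered as a sketch and not as
#  part of hypoth_2test_unknown(): base R's t.test() runs the
#  Welch (unequal variance) test by default.  Assuming that
#  the "full" degrees of freedom are the Welch-Satterthwaite
#  value, its one-sided P-value should land close to the
#  full Attd value.  The two samples are rebuilt under new
#  names so that these lines run on their own.
chk_first  <- c(1, 1, 3, rep(5, 47))            # same scores as first
chk_second <- c(1, 1, 3, 1, 1, 3, rep(5, 44))   # same scores as second
chk <- t.test(chk_first, chk_second, alternative = "greater")
chk$parameter   # degrees of freedom used by t.test()
chk$p.value     # compare with the full Attd value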

# Now, do this again, but with samples called third and fourth.
# For these, however, we will
# make the standard deviations small by using only scores 
# of 5 and 4.
                   
third <- c(rep(5,40),rep(4,10))
third # look at third

mean( third )
sd(third)


fourth<- c(rep(5,30),rep(4,20))
fourth #look at fourth
mean(fourth)
sd(fourth)

#  Now run the same test, but with the new samples
hypoth_2test_unknown(sd(third), 50, mean( third),
                     sd(fourth), 50, mean(fourth),
                     1, 0.05)
#  The full Attd value of 0.0146 tells us that if the
#  null hypothesis were true then we would get two samples
#  showing this difference, or one more extreme than this, 
#  in about 1.46% of the samples.  That is too rare!
#  Therefore, we would reject the null hypothesis at 
#  the 0.05 level, in favor of the alternative, which says
#  that the population mean behind the third sample is
#  higher than the population mean behind the fourth.


#  Now let us look at the real data.  First we can read in 
#  all of the data.  
clean_data <- read.csv("josh_real_clean.csv")
# then look at it.
clean_data
str( clean_data )

# The two groups that we are examining are the FB color
# (items 1-36 and 109-122) and the GB color (items 37-71 
# and 123-137).  Those values represent the responses
# from the first group (all 50 people) and the second
# group (another 50 people).  

# However, looking at the data we have a real problem.
# The $Confidence score is the result of asking "How confident 
# are you based on the color?" The $Unease score is the 
# result of asking "How uneasy are you based on the color?"
# These are opposite readings.  The more "confident" you are
# the less "uneasy" you should be.  If you respond with a 6
# for both questions then it is clear that you are not being
# truthful.  You are just marking down answers.

# The same is true for $Trust and $Untrust.  Items 109-137
# are clearly responses that are so conflicted that they are
# meaningless.  We should ignore them.
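#
# A sketch of how such conflicted responses could be flagged
# in code.  flag_conflicted() and its cutoff of 5 are
# assumptions made for this illustration; the column names
# are the ones mentioned above.  The toy data frame is
# made up and is not part of the survey.
flag_conflicted <- function(df, high = 5) {
  # TRUE when both members of an opposite pair are rated high
  (df$Confidence >= high & df$Unease >= high) |
    (df$Trust >= high & df$Untrust >= high)
}
toy <- data.frame(Confidence = c(6, 5, 2),
                  Unease     = c(6, 1, 5),
                  Trust      = c(4, 6, 1),
                  Untrust    = c(2, 6, 5))
flag_conflicted(toy)   # rows 1 and 2 are flagged, row 3 is not
# With the survey loaded, the same call could be tried as
#    which(flag_conflicted(clean_data))
# to see which item numbers it picks out.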

# We are only interested in the values where the useif.0 
# value is 0, and then we only want to look at the 
# Confidence scores.

#  The first group we want is the FB items, 1 through 36
clean_1 <- clean_data$Confidence[1:36]
mean( clean_1 )
sd( clean_1 )
# the second group we want is the GB items, 37 through 71
clean_2 <-clean_data$Confidence[37:71] 
mean(clean_2)
sd(clean_2)
#  Now we can run the test
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean( clean_1),
                     sd(clean_2), length(clean_2), mean(clean_2),
                     1, 0.05)
# Based on the sample that we have, excluding clearly 
# bad data and using the resulting means and standard 
# deviations, the result is that we do not have evidence to
# reject the null hypothesis of "no difference between colors"
# in favor of the alternative "the FB color is better than
# the GB color" at the 0.05 level of significance.



######### the real case

#  It turns out there was a third color tested, the HB's.
#  Let us pull out those good items, 72 through 108.

#  What if we compare the FB and HB scores on confidence?
clean_3 <-clean_data$Confidence[72:108] 
mean(clean_3)
sd(clean_3)
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean( clean_1),
                     sd(clean_3), length(clean_3), mean(clean_3),
                     1, 0.05)
# In this comparison, with an attained significance of 0.0115,
# we would have significant evidence, at the 0.05 level, to
# reject the idea that the two colors, FB and HB, are viewed
# in the same way in favor of the hypothesis that the FB color 
# engenders more confidence than does the HB color.

#  We could go on to test GB against HB, but even the step
#  above should not have been taken.  What we really want to
#  know is whether all the colors are the same or whether
#  there is some difference among them.  We should not be
#  testing separate pairs, FB vs. GB, FB vs. HB, and then
#  GB vs. HB, because each extra test adds another chance
#  of rejecting a true null hypothesis.  The appropriate
#  test is called an ANalysis Of VAriance, or ANOVA, and
#  at this time that is beyond the material of this course.
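#
#  For the curious, a small sketch of why testing pair after
#  pair is risky and of what a single test could look like.
#  First, a rough calculation: if three tests were independent
#  and each were run at the 0.05 level, the chance of at least
#  one false rejection would be well above 0.05.  (The three
#  pairwise tests here share data, so this is only a guide.)
1 - (1 - 0.05)^3
#  Second, base R's oneway.test() compares all the group
#  means in a single test.  The scores below are made up
#  for illustration; they are not the survey data.
toy_scores <- c(rep(5, 8), rep(4, 2),    # stand-in for FB
                rep(5, 6), rep(4, 4),    # stand-in for GB
                rep(4, 7), rep(3, 3))    # stand-in for HB
toy_color  <- factor(rep(c("FB", "GB", "HB"), each = 10))
toy_anova  <- oneway.test(toy_scores ~ toy_color)
toy_anova$p.value   # one P-value for "all means are equal"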